On Maxentropic Discrete Stationary Processes

By D. SLEPIAN 

(Manuscript received September 24, 1971) 

This paper is concerned with the following mathematical problem.
Let X denote a stationary time-discrete random process whose variables,
$\ldots, X_{-1}, X_0, X_1, \ldots$, take values from the finite set of real numbers
$\{x_1, x_2, \ldots, x_K\}$. Let X have mean zero and a given covariance sequence
$\rho_k = E X_j X_{j+k}$, $j, k = 0, \pm 1, \pm 2, \ldots$. What is the largest entropy
that X can have and what is the probability structure of this most random
process of given second moments?

I. INTRODUCTION 

Let X denote a stationary time-discrete random process whose
variables, $\ldots, X_{-1}, X_0, X_1, \ldots$, take values from the finite set of
real numbers $\{x_1, x_2, \ldots, x_K\}$. Let X have mean zero and a given
covariance sequence $\rho_k = E X_j X_{j+k}$, $j, k = 0, \pm 1, \pm 2, \ldots$. What
is the largest entropy that X can have and what is the probability
structure of this most random process of given second moments?

Our interest in this question arose from the consideration of certain 
pulse-type communication systems used for the transmission of digital 
data. In such systems, a customer provides data in the form of an 
infinite sequence of binary digits that can be represented by a stationary 
process Y whose variables, $\ldots, Y_{-1}, Y_0, Y_1, \ldots$, are independent
random variables each taking values zero and one with equal probabilities.
An encoder transforms Y into a K-level process X of the sort
described above, whose random variables are then used as amplitudes
for successive pulses of a train. The transmitted signal is thus of the
form

$$s(t) = \sum_{n=-\infty}^{\infty} X_n\, g(t - nT + \theta) \tag{1}$$

where g(t) is the pulse shape and T > 0 is the pulse repetition period
of the system. It is easy to compute that the power density spectrum
of the stochastic process (1) is given by

$$\Phi_s(f) = \frac{1}{T}\,|G(f)|^2\,\Phi_x(fT) \tag{2}$$

where G(f) is the Fourier transform of g(t) and

$$\Phi_x(f) = \sum_{n=-\infty}^{\infty} \rho_n e^{2\pi i n f} \tag{3}$$

is the spectrum of the discrete-amplitude process X. Here it has been
assumed that $\theta$ is uniformly distributed in (0, T).

Many different encoding schemes for mapping the customer's data 
stream Y onto the pulse amplitude stream X have been proposed 
in the past. Typical are dicode, partial response, pseudo-ternary, 
run-length-limited codes, etc. Entry to the literature on this subject
can be made through Refs. 1-4. In general, these encoding schemes
are employed to give $\Phi_x(f)$, and hence $\Phi_s(f)$, some desirable shape
that will be particularly well-suited to the transmission medium, the
noise, and the demodulation process. However, such deviations of
$\Phi_x(f)$ from a flat shape ($\Phi_x$ = constant) are bought at the price of a
decreased information rate for the system as will be seen in an example
below. Solution to the problem posed in the opening paragraph would
yield the maximum information rate possible with given amplitudes
$x_1, x_2, \ldots, x_K$ and given spectrum $\Phi_x(f)$.

To illustrate these matters, consider the simple case of dicode for 
which the encoding is 

$$X_n = Y_n - Y_{n-1}, \qquad n = 0, \pm 1, \pm 2, \ldots.$$

Here K = 3 and the allowed pulse amplitudes are $x_1 = 1$, $x_2 = 0$,
$x_3 = -1$. It is readily computed that for this amplitude process $\rho_0 = \tfrac{1}{2}$,
$\rho_1 = \rho_{-1} = -\tfrac{1}{4}$, and $\rho_n = 0$ for $n = \pm 2, \pm 3, \ldots$ and so
$\Phi_{\mathrm{dicode}}(f) = \sin^2 \pi f$. This spectrum vanishes like $f^2$ at zero frequency, a frequently
desirable property. But this 3-level scheme signals at a rate of only
one bit of information per pulse whereas a rate of $\log_2 3 = 1.58$ bits
per pulse could be had by appropriate mapping of the customer's
binary digits onto independent random variables taking the same
amplitude values, -1, 0, and 1, each with probability 1/3. This latter
encoding would, of course, yield a flat spectrum. Thus dicode achieves
a desired spectrum at the cost of about a 1/3 decrease in rate. Can
any scheme with the same values and spectrum as dicode attain a rate
greater than one bit per pulse? What is the highest rate so achievable?
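The dicode figures quoted above can be checked by brute-force enumeration. The following is a minimal sketch, not part of the original paper: the function names `dicode_covariance` and `spectrum` are illustrative choices, and the covariance is obtained by averaging over all equally likely patterns of the customer's bits.

```python
import itertools
import math


def dicode_covariance(max_lag=3):
    """Exact rho_k = E[X_0 X_k] for X_n = Y_n - Y_{n-1}, Y i.i.d. fair bits."""
    n_bits = max_lag + 2                      # Y_{-1}, Y_0, ..., Y_{max_lag}
    weight = 1.0 / 2 ** n_bits                # every bit pattern is equally likely
    rho = [0.0] * (max_lag + 1)
    for y in itertools.product((0, 1), repeat=n_bits):
        x = [y[i + 1] - y[i] for i in range(n_bits - 1)]   # X_0, ..., X_{max_lag}
        for k in range(max_lag + 1):
            rho[k] += weight * x[0] * x[k]
    return rho


def spectrum(rho, f):
    """Phi_x(f) = sum_n rho_n exp(2 pi i n f), using rho_{-n} = rho_n."""
    return rho[0] + 2.0 * sum(
        rho[n] * math.cos(2.0 * math.pi * n * f) for n in range(1, len(rho))
    )


rho = dicode_covariance()
rate_dicode = 1.0                 # one customer bit per pulse
rate_ternary = math.log2(3)       # independent, equiprobable -1, 0, +1
print("rho_0..rho_3:", rho)
print("Phi(0.25) vs sin^2(pi/4):", spectrum(rho, 0.25), math.sin(math.pi * 0.25) ** 2)
print("relative rate loss:", 1.0 - rate_dicode / rate_ternary)
```

The relative rate loss comes out near 0.37, which is the "about 1/3" quoted in the text.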

We have been unable to answer even these seemingly simple specific 
questions. Quite apart from applications to pulse-amplitude data 
transmission systems, the general question of finding a maxentropic 
finite state discrete process of given second moments is of interest in 
its own right. As we shall see, such a process is a natural finite state 
analog of the Gaussian process and could serve as a convenient model 
in many contexts. We have been able to make only slight progress 
in solving this more general problem. 

It is the purpose of this paper to record the progress we have made 
and the approaches we have followed in pursuing these goals, and to 
exhibit the difficulties encountered as well. It is hoped that others 
who may become interested in this problem can thereby avoid some 
pitfalls and be guided to more successful approaches. 

II. REDUCTION TO THE MARKOV CASE 

Let X be a stationary process $\ldots, X_{-1}, X_0, X_1, \ldots$ where each $X_i$
takes values from the set of K real numbers $\{x_1, x_2, \ldots, x_K\}$. We
denote the probability distribution of n successive X's by

$$p_n(\epsilon_1, \ldots, \epsilon_n) = \Pr\{X_{i+1} = x_{\epsilon_1}, \ldots, X_{i+n} = x_{\epsilon_n}\}. \tag{4}$$

Here each index $\epsilon_1, \epsilon_2, \ldots, \epsilon_n$ takes values from the set $\{1, 2, \ldots, K\}$.
We have, of course,

$$\sum_{\epsilon_1, \ldots, \epsilon_n} p_n(\epsilon_1, \epsilon_2, \ldots, \epsilon_n) = 1 \tag{5}$$

and

$$p_n(\epsilon_1, \ldots, \epsilon_n) \ge 0, \qquad \epsilon_1, \epsilon_2, \ldots, \epsilon_n = 1, 2, \ldots, K. \tag{6}$$

The stationarity of X implies that the left of (4) is independent of i,
and furthermore that

$$\sum_{\alpha=1}^{K} p_n(\alpha, \epsilon_1, \ldots, \epsilon_{n-1}) = \sum_{\alpha=1}^{K} p_n(\epsilon_1, \ldots, \epsilon_{n-1}, \alpha) = p_{n-1}(\epsilon_1, \ldots, \epsilon_{n-1}), \qquad \epsilon_1, \epsilon_2, \ldots, \epsilon_{n-1} = 1, 2, \ldots, K. \tag{7}$$

The statements (4) through (6) are to hold for $n = 1, 2, \ldots$ and (7) for
$n = 2, 3, \ldots$. Note that if (7) holds for $n = n_0$, this implies the validity
of (7) for $n = 2, 3, \ldots, n_0$.

The entropy of X is defined by 
$$H(X) = \lim_{n \to \infty} \frac{1}{n} H_n(X), \tag{8}$$

$$H_n(X) = -\sum p_n(\epsilon_1, \ldots, \epsilon_n) \log p_n(\epsilon_1, \ldots, \epsilon_n) \tag{9}$$

where the sum is over the $K^n$ allowed values of the $\epsilon$'s. We seek to
maximize (8) by suitable choice of a hierarchy of distributions
$p_n(\epsilon_1, \ldots, \epsilon_n)$, $n = 1, 2, \ldots$ that satisfy (5), (6), and (7) and the
constraints

$$E X_i = \sum_{\epsilon} x_{\epsilon}\, p_1(\epsilon) = 0, \tag{10}$$

$$E X_j X_{j+k} = \sum x_{\epsilon_1} x_{\epsilon_{k+1}}\, p_{k+1}(\epsilon_1, \ldots, \epsilon_{k+1}) = \rho_k, \qquad k = 0, 1, 2, \ldots. \tag{11}$$

Here the $\rho_k$ are given and the sum is over all allowable values
of $\epsilon_1, \epsilon_2, \ldots, \epsilon_{k+1}$.

We do not know how to proceed directly with this problem. One
approach is to attempt to solve the problem when the constraint (11)
is imposed only for $k = 0, 1, 2, \ldots, L$. That is, we seek the process
of maximum entropy whose first L + 1 covariance elements are prescribed.
Let $H^{(L)}(X)$ denote this maximum entropy and let
$p_n^{(L)}(\epsilon_1, \ldots, \epsilon_n)$, $n = 1, 2, \ldots$, be the corresponding distribution.
We would then investigate the behavior of these quantities as $L \to \infty$.
We have, of course, $H^{(L)}(X) \ge H(X)$.

In Appendix A we establish 

Theorem 1: The K-valued stationary discrete process of largest entropy
with mean zero, given values $x_1, \ldots, x_K$, and given values of
$\rho_0, \rho_1, \ldots, \rho_L$ is an Lth-order Markov process.

An Lth-order Markov process is characterized by the fact that

$$\Pr\{X_n = x_{\epsilon_n} \mid X_{n-1} = x_{\epsilon_{n-1}}, \ldots, X_{n-L} = x_{\epsilon_{n-L}}, X_{n-L-1} = x_{\epsilon_{n-L-1}}, \ldots\} = \Pr\{X_n = x_{\epsilon_n} \mid X_{n-1} = x_{\epsilon_{n-1}}, \ldots, X_{n-L} = x_{\epsilon_{n-L}}\}$$

for all n and all allowable values of the $\epsilon$'s. A stationary Lth-order
Markov process can be specified by $K^{L+1}$ transition probabilities

$$q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) = \Pr\{X_{L+1} = x_{\epsilon_{L+1}} \mid X_1 = x_{\epsilon_1}, \ldots, X_L = x_{\epsilon_L}\}, \qquad \epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K$$

and a corresponding Lth-order distribution $p_L(\epsilon_1, \ldots, \epsilon_L)$ that satisfies

$$\sum_{\epsilon_1=1}^{K} q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L)\, p_L(\epsilon_1, \ldots, \epsilon_L) = p_L(\epsilon_2, \ldots, \epsilon_{L+1}), \qquad \epsilon_2, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K. \tag{12}$$

We have, of course,

$$q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) \ge 0, \tag{13}$$

$$\sum_{\alpha=1}^{K} q_L(\alpha \mid \epsilon_1, \ldots, \epsilon_L) = 1, \qquad \epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K. \tag{14}$$

Equations (12) and (14) guarantee that the normalized solutions $p_L$
of (12) have property (7) (with n = L). The general term $p_n$ of the
probability distribution for such a process is given in terms of $p_L$ by
the product rule

$$p_n(\epsilon_1, \ldots, \epsilon_n) = p_L(\epsilon_1, \ldots, \epsilon_L)\, q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L)\, q_L(\epsilon_{L+2} \mid \epsilon_2, \ldots, \epsilon_{L+1}) \cdots q_L(\epsilon_n \mid \epsilon_{n-L}, \epsilon_{n-L+1}, \ldots, \epsilon_{n-1}) \tag{15}$$

for n > L. For n < L,

$$p_n(\epsilon_1, \ldots, \epsilon_n) = \sum_{\alpha_1=1}^{K} \cdots \sum_{\alpha_{L-n}=1}^{K} p_L(\epsilon_1, \ldots, \epsilon_n, \alpha_1, \ldots, \alpha_{L-n}). \tag{16}$$

It is easy to show that for a stationary Lth-order Markov process the
entropy (8) through (9) is given by

$$\begin{aligned}
H &= -\sum p_L(\epsilon_1, \ldots, \epsilon_L) \sum_{\alpha} q_L(\alpha \mid \epsilon_1, \ldots, \epsilon_L) \log q_L(\alpha \mid \epsilon_1, \ldots, \epsilon_L) \\
  &= -\sum p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \log p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) + \sum p_L(\epsilon_1, \ldots, \epsilon_L) \log p_L(\epsilon_1, \ldots, \epsilon_L) \\
  &= H_{L+1} - H_L.
\end{aligned} \tag{17}$$
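The identity (17) is easy to confirm numerically in the simplest case L = 1. The sketch below is illustrative only: the 3-state transition matrix `Q` is an arbitrary choice made here, not data from the paper, and the stationary distribution is found by plain power iteration, which suffices because all entries of `Q` are positive.

```python
import math


def stationary(Q, iters=2000):
    """Stationary row vector p with p Q = p, by power iteration."""
    K = len(Q)
    p = [1.0 / K] * K
    for _ in range(iters):
        p = [sum(p[i] * Q[i][j] for i in range(K)) for j in range(K)]
    return p


def entropy(probs):
    """Shannon entropy in bits of a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


# Hypothetical first-order transition probabilities q_1(j | i) = Q[i][j].
Q = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]

p1 = stationary(Q)                                          # p_1(e1)
p2 = [p1[i] * Q[i][j] for i in range(3) for j in range(3)]  # p_2(e1, e2)

# First line of (17): average conditional entropy of the transitions.
h_conditional = -sum(
    p1[i] * Q[i][j] * math.log2(Q[i][j]) for i in range(3) for j in range(3)
)
# Last line of (17): H_{L+1} - H_L with L = 1.
h_difference = entropy(p2) - entropy(p1)
print(h_conditional, h_difference)
```

The two printed values agree to rounding error, as (17) requires.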

III. THE DETAILED DISTRIBUTION 

Now to find the most random stationary Lth-order Markov process
with given $\rho_0, \rho_1, \ldots, \rho_L$, we must maximize $H_{L+1} - H_L$ by proper
choice of the $K^{L+1}$ quantities $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$ subject to certain
linear constraints of the form

$$\sum_{\epsilon} a_i(\epsilon_1, \ldots, \epsilon_{L+1})\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = b_i, \qquad i = 1, 2, \ldots, M. \tag{18}$$

We assume this system is of rank $M' \le M$.
There are two ways to proceed: (i) by the method of Lagrange
multipliers, which treats the unknown $p_{L+1}$'s symmetrically; (ii) by
expressing $H_{L+1} - H_L$ in terms of $K^{L+1} - M'$ independent $p_{L+1}$'s
obtained by solving (18). Both methods lead to unwieldy higher-order
algebraic equations with which we have been able to do little in the
general case. The form of the solutions is not without some interest,
however, and we record it here.

To avoid unnecessary superficial complications, we shall henceforth
assume that if x is one of the allowed values $x_1, x_2, \ldots, x_K$, then $-x$
is also in the set of allowed values. This condition will assure that

$$\Pr\{X_1 = x_{\epsilon_1}, \ldots, X_n = x_{\epsilon_n}\} = \Pr\{X_1 = -x_{\epsilon_1}, \ldots, X_n = -x_{\epsilon_n}\}$$

in the optimal process and that $E X_j = 0$, $j = 0, \pm 1, \ldots$.

3.1 Lagrange Multipliers 

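Let us define the sample lag sums

$$\begin{aligned}
l_0^{(n)}(\epsilon_1, \ldots, \epsilon_n) &= x_{\epsilon_1}^2 + x_{\epsilon_2}^2 + \cdots + x_{\epsilon_n}^2, \\
l_j^{(n)}(\epsilon_1, \ldots, \epsilon_n) &= x_{\epsilon_1} x_{\epsilon_{1+j}} + x_{\epsilon_2} x_{\epsilon_{2+j}} + \cdots + x_{\epsilon_{n-j}} x_{\epsilon_n}
\end{aligned} \tag{19}$$

and the function

$$h_n(\epsilon_1, \ldots, \epsilon_n; \lambda_0, \lambda_1, \ldots, \lambda_{n-1}) = \exp \sum_{j=0}^{n-1} \lambda_j\, l_j^{(n)}(\epsilon_1, \ldots, \epsilon_n). \tag{20}$$

Then the Lagrange solution can be stated as follows. Solve the homogeneous
system of equations

$$\sum_{\alpha=1}^{K} h_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha; \lambda_0, \ldots, \lambda_L)\, f(\epsilon_2, \ldots, \epsilon_L, \alpha) = \frac{1}{c}\, f(\epsilon_1, \ldots, \epsilon_L), \qquad \epsilon_1, \epsilon_2, \ldots, \epsilon_L = 1, 2, \ldots, K \tag{21}$$

for the $K^L$ f's and c. Then the transition probabilities and initial stationary
distribution of the maxentropic process are given by

$$q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) = c\, h_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}; \lambda_0, \ldots, \lambda_L)\, \frac{f(\epsilon_2, \ldots, \epsilon_{L+1})}{f(\epsilon_1, \ldots, \epsilon_L)}, \tag{22}$$

$$p_L(\epsilon_1, \ldots, \epsilon_L) = k\, f(\epsilon_1, \ldots, \epsilon_L)\, f(\epsilon_L, \ldots, \epsilon_1), \qquad \epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K. \tag{23}$$

In (23), k > 0 must be chosen so that the $p_L$ sum to unity. A derivation
of these equations is given in Appendix B.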

While equations (21), (22), and (23) are a formal solution to our
problem, in practice they are of little value. The solutions $p_L$ and $q_L$
contain the Lagrange multipliers $\lambda_0, \lambda_1, \ldots, \lambda_L$ in a complicated way,
and these must be determined to give the prescribed covariance elements
$\rho_0, \rho_1, \ldots, \rho_L$. Presumably that eigenvalue c of (21) should be taken
which gives maximum entropy and yields $q_L \ge 0$ in (22). In the small
examples we have carried out, $p_L$ and $q_L$ turned out to be independent
of the eigenvalue chosen in (21), but we have been unable to prove
anything in general about this situation. For particular processes,
say the symmetric binary process with K = 2, $x_1 = 1$, $x_2 = -1$, for
example, equations (21) take a special simple seductive form that
suggests the possibility of explicit closed-form solution. We have been
unable to find one.

Perhaps the best that can be said for this curious Lagrange solution
is that (23) shows clearly that in the maxentropic process
$p_L(\epsilon_1, \ldots, \epsilon_L) = p_L(\epsilon_L, \ldots, \epsilon_1)$. It is not hard to see that the product rule (15)
and the form of (22) and (23) propagate this property so that for
arbitrary n, $p_n(\epsilon_1, \ldots, \epsilon_n) = p_n(\epsilon_n, \ldots, \epsilon_1)$. The maxentropic process
treats past and future in a symmetric manner.

3.2 The Independent Variable Approach 
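We seek to maximize

$$J = -\sum p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \log p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) + \sum p_L(\epsilon_1, \ldots, \epsilon_L) \log p_L(\epsilon_1, \ldots, \epsilon_L) \tag{24}$$

by choice of the $K^{L+1}$ quantities $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$. Here we define

$$p_L(\epsilon_1, \ldots, \epsilon_L) = \sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha), \qquad \epsilon_1, \ldots, \epsilon_L = 1, 2, \ldots, K. \tag{25}$$

The $p_{L+1}$'s must satisfy

$$\sum p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = 1, \tag{26}$$

$$\sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) - \sum_{\alpha} p_{L+1}(\alpha, \epsilon_1, \ldots, \epsilon_L) = 0, \qquad \epsilon_1, \ldots, \epsilon_L = 1, 2, \ldots, K, \tag{27}$$

$$\sum x_{\epsilon_1} x_{\epsilon_{1+k}}\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = \rho_k, \qquad k = 0, 1, \ldots, L. \tag{28}$$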

These $K^L + L + 2 = M$ equations are of the form (18). We suppose
that they can be solved for $M' \le M$ of the $p_{L+1}$'s in terms of the remaining
$K^{L+1} - M' = M''$ ones. We denote these $M''$ independent
$p_{L+1}$'s by the variables $\xi_1, \xi_2, \ldots, \xi_{M''}$ and denote the $M'$ dependent
$p_{L+1}$'s by $\eta_1, \ldots, \eta_{M'}$. Thus we write equations (26), (27), and (28)
in the form

$$\eta_i = \alpha_i + \sum_{j=1}^{M''} \beta_{ij}\, \xi_j, \qquad i = 1, 2, \ldots, M'. \tag{29}$$

It is convenient also to adopt a single index notation for the $p_L$ of
(25), which we now denote by $\zeta_1, \zeta_2, \ldots, \zeta_N$, where $N = K^L$. By
means of (29) the right of (25) can be expressed in terms of the $\xi$'s.
We write

$$\zeta_i = \delta_i + \sum_{j=1}^{M''} \gamma_{ij}\, \xi_j, \qquad i = 1, 2, \ldots, N. \tag{30}$$
i-i 

We further note that (26) and (25) imply that

$$\sum_{1}^{M''} \xi_i + \sum_{1}^{M'} \eta_i = 1, \qquad \sum_{1}^{N} \zeta_i = 1.$$

Since these are to hold as identities in the $\xi$'s we must have

$$\sum_{1}^{M'} \alpha_i = \sum_{1}^{N} \delta_i = 1, \qquad \sum_{i=1}^{N} \gamma_{ij} = 1 + \sum_{i=1}^{M'} \beta_{ij} = 0, \qquad j = 1, 2, \ldots, M''. \tag{31}$$

In this new notation (24) becomes

$$J = -\sum_{1}^{M''} \xi_i \log \xi_i - \sum_{1}^{M'} \eta_i \log \eta_i + \sum_{1}^{N} \zeta_i \log \zeta_i$$

where the $\eta$'s and $\zeta$'s are explicit linear functions of the $\xi$'s given by (29)
and (30). Maximizing with respect to the $\xi$'s gives

$$\frac{\partial J}{\partial \xi_j} = 0 = -1 - \log \xi_j - \sum_{i=1}^{M'} (1 + \log \eta_i)\, \beta_{ij} + \sum_{i=1}^{N} (1 + \log \zeta_i)\, \gamma_{ij}.$$

On taking account of (31), we find finally

$$\xi_j = \frac{\prod_{i=1}^{N} \zeta_i^{\gamma_{ij}}}{\prod_{i=1}^{M'} \eta_i^{\beta_{ij}}}, \qquad j = 1, \ldots, M''. \tag{32}$$

These are $M''$ equations for the $M''$ $\xi$'s. There seems to be little that
can be said in general about them, and so the trail ends here. (We
note only that the form of these equations is appropriate for an iterative
numerical solution: a trial set of $\xi$'s used to evaluate the products yields
directly a new set of $\xi$'s.)

IV. SOME SIMPLE EXAMPLES 

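We consider first the binary case and set

$$x_1 = 1, \qquad x_2 = -1. \tag{33}$$

In this case we must take $\rho_0 = 1$.
When L = 1, we find

$$p(1,1) = p(2,2) = \tfrac{1}{4}(1 + \rho_1)$$
$$p(1,2) = p(2,1) = \tfrac{1}{4}(1 - \rho_1)$$
$$H = -\tfrac{1}{2}(1 + \rho_1) \log \tfrac{1}{2}(1 + \rho_1) - \tfrac{1}{2}(1 - \rho_1) \log \tfrac{1}{2}(1 - \rho_1)$$
$$\rho_n = \rho_1^{\,n}, \qquad n = 0, 1, 2, \ldots,$$
$$\Phi(f) = \frac{1 - \rho_1^2}{1 + \rho_1^2 - 2\rho_1 \cos 2\pi f}.$$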
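When L = 2,

$$p(1,1,1) = p(2,2,2) = \tfrac{1}{8}(1 + 2\rho_1 + \rho_2)$$
$$p(1,1,2) = p(2,2,1) = \tfrac{1}{8}(1 - \rho_2)$$
$$p(1,2,2) = p(2,1,1) = \tfrac{1}{8}(1 - \rho_2)$$
$$p(1,2,1) = p(2,1,2) = \tfrac{1}{8}(1 - 2\rho_1 + \rho_2).$$

Let

$$\alpha_{\pm} = \frac{1}{2(1 - \rho_1^2)} \left[ \rho_1(1 - \rho_2) \pm \sqrt{4\rho_1^4 + \rho_1^2(\rho_2^2 - 6\rho_2 - 3) + 4\rho_2} \right].$$

Then

$$\rho_n = \frac{1}{(1 + \alpha_+ \alpha_-)(\alpha_+ - \alpha_-)} \left[ (1 - \alpha_-^2)\, \alpha_+^{\,n+1} - (1 - \alpha_+^2)\, \alpha_-^{\,n+1} \right],$$

$$\Phi(f) = \frac{(1 - \rho_1^2)(1 - \rho_2)(1 - 2\rho_1^2 + \rho_2)}{1 - \rho_1^2 + 2\rho_1^4 - 4\rho_1^2 \rho_2 + \rho_2^2 + \rho_1^2 \rho_2^2 - 2\rho_1(1 - \rho_2)^2 \cos 2\pi f + 2(\rho_1^2 - \rho_2)(1 - \rho_1^2) \cos 4\pi f}.$$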
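As a sanity check on the L = 2 binary formulas, one can build the second-order chain from the three-point probabilities and compare its covariance with the closed form in $\alpha_{\pm}$. This is a sketch under assumptions made here: the values $\rho_1 = 0.5$, $\rho_2 = 0.4$ are arbitrary admissible test inputs (with real $\alpha_{\pm}$), the amplitudes are coded directly as +1 and -1, and the compact expression in `p3` is just a restatement of the four listed probabilities.

```python
import math


def p3(x1, x2, x3, r1, r2):
    """Three-point probability for amplitudes +-1; reproduces the L = 2 table."""
    return (1.0 + r1 * x1 * x2 + r1 * x2 * x3 + r2 * x1 * x3) / 8.0


def chain_covariance(r1, r2, n_max):
    """rho_0..rho_n_max of the second-order Markov chain defined by p3."""
    vals = (1, -1)
    # joint[(x0, a, b)] = Pr{X_0 = x0, X_{n-1} = a, X_n = b}, started at n = 1
    joint = {(a, a, b): (1.0 + r1 * a * b) / 4.0 for a in vals for b in vals}
    rho = [1.0]
    for _ in range(1, n_max + 1):
        rho.append(sum(x0 * b * pr for (x0, a, b), pr in joint.items()))
        nxt = {}
        for (x0, a, b), pr in joint.items():
            p2 = (1.0 + r1 * a * b) / 4.0           # two-point marginal
            for c in vals:
                q = p3(a, b, c, r1, r2) / p2        # transition probability
                nxt[(x0, b, c)] = nxt.get((x0, b, c), 0.0) + pr * q
        joint = nxt
    return rho


def closed_form(r1, r2, n):
    """rho_n from the alpha_+- expression; assumes a positive discriminant."""
    disc = 4.0 * r1 ** 4 + r1 ** 2 * (r2 ** 2 - 6.0 * r2 - 3.0) + 4.0 * r2
    root = math.sqrt(disc)
    ap = (r1 * (1.0 - r2) + root) / (2.0 * (1.0 - r1 ** 2))
    am = (r1 * (1.0 - r2) - root) / (2.0 * (1.0 - r1 ** 2))
    num = (1.0 - am ** 2) * ap ** (n + 1) - (1.0 - ap ** 2) * am ** (n + 1)
    return num / ((1.0 + ap * am) * (ap - am))


r1, r2 = 0.5, 0.4                      # hypothetical admissible test values
rho_chain = chain_covariance(r1, r2, 8)
rho_closed = [closed_form(r1, r2, n) for n in range(9)]
print(rho_chain)
print(rho_closed)
```

The two lists agree to rounding error, and their first three entries return the prescribed values 1, 0.5, and 0.4.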
When L = 3, we are already in algebraic difficulties. Equations
(26), (27), and (28) in the present case permit us to solve for all the
p's in terms of $\xi = p(1,1,1,1)$. We find

$$p(1,1,1,2) = p(1,2,2,2) = p(2,1,1,1) = p(2,2,2,1) = \tfrac{1}{8}(1 + 2\rho_1 + \rho_2) - \xi$$
$$p(1,1,2,1) = p(1,2,1,1) = p(2,1,2,2) = p(2,2,1,2) = \tfrac{1}{8}(1 + \rho_1 + \rho_2 + \rho_3) - \xi$$
$$p(1,1,2,2) = p(2,2,1,1) = \tfrac{1}{8}(-\rho_1 - 2\rho_2 - \rho_3) + \xi$$
$$p(1,2,1,2) = p(2,1,2,1) = \tfrac{1}{8}(-3\rho_1 - \rho_3) + \xi$$
$$p(1,2,2,1) = p(2,1,1,2) = \tfrac{1}{8}(-2\rho_1 - 2\rho_2) + \xi$$
$$p(2,2,2,2) = \xi.$$

On setting $Z = 8\xi$, equation (32) becomes

$$Z = \frac{[1 + 2\rho_1 + \rho_2 - Z]^4\, [1 + \rho_1 + \rho_2 + \rho_3 - Z]^4}{Z\, [-3\rho_1 - \rho_3 + Z]^2\, [-\rho_1 - 2\rho_2 - \rho_3 + Z]^2\, [-2\rho_1 - 2\rho_2 + Z]^2}.$$

One can take the square root of both sides of this equation, clear fractions,
and expand to obtain a cubic equation in Z. It is not hard to
show that there are no roots rational in $\rho_1$, $\rho_2$, and $\rho_3$, so that the
simple dependence of p on the $\rho$'s exhibited for the cases L = 1 and
2 fails here.

We next consider the case K = 3 and choose

$$x_1 = 1, \qquad x_2 = 0, \qquad x_3 = -1.$$

Here with L = 1 we already meet with higher-degree algebraic equations.
The constraints permit solution of all p's in terms of $p(1,1) = \xi$.
We find

$$p(1,2) = p(2,1) = p(2,3) = p(3,2) = \tfrac{1}{2}(\rho_0 + \rho_1) - 2\xi$$
$$p(3,1) = p(1,3) = -\tfrac{1}{2}\rho_1 + \xi$$
$$p(2,2) = 1 - 2\rho_0 - \rho_1 + 4\xi$$
$$p(3,3) = \xi.$$

On setting $Z = 4\xi$, equation (32) becomes

$$Z = \frac{(\rho_0 + \rho_1 - Z)^4}{(-2\rho_1 + Z)(1 - 2\rho_0 - \rho_1 + Z)^2}, \tag{34}$$

which is a cubic in Z. One finds

$$\rho_n = \rho_0 \left( \frac{\rho_1}{\rho_0} \right)^{n}, \qquad n = 0, 1, 2, \ldots \tag{35}$$

quite independent of the value chosen for $\xi$. The spectrum is given by

$$\Phi(f) = \frac{\rho_0(\rho_0^2 - \rho_1^2)}{\rho_0^2 + \rho_1^2 - 2\rho_0 \rho_1 \cos 2\pi f}. \tag{36}$$

Using the dicode values $\rho_0 = \tfrac{1}{2}$, $\rho_1 = -\tfrac{1}{4}$, one finds from (34) that
$\xi = 0.0103$. The entropy of the resulting simple Markov process is
found to be 1.299 bits, which is greater than the one-bit rate of dicode.
While the first two terms of the covariance sequence agree with the
dicode values, the higher terms are given by (35) and the spectrum,
as given by (36), does not vanish for f = 0.
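The dicode numbers for K = 3, L = 1 can be approximately reproduced by solving (34) numerically. The sketch below is not the author's procedure: it uses plain bisection on the cleared-fraction form of (34) over the interval on which all probabilities are non-negative, and the function names are illustrative.

```python
import math


def solve_z(rho0, rho1, tol=1e-14):
    """Root of (34) with fractions cleared, on the feasible interval for Z."""
    def g(z):
        return (z * (z - 2.0 * rho1) * (1.0 - 2.0 * rho0 - rho1 + z) ** 2
                - (rho0 + rho1 - z) ** 4)
    lo = max(0.0, 2.0 * rho1, 2.0 * rho0 + rho1 - 1.0)   # g(lo) < 0
    hi = rho0 + rho1                                      # g(hi) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def entropy_bits(weighted):
    """-sum m * p * log2(p) over (multiplicity, probability) pairs."""
    return -sum(m * p * math.log2(p) for m, p in weighted if p > 0.0)


rho0, rho1 = 0.5, -0.25                       # dicode values
xi = solve_z(rho0, rho1) / 4.0                # Z = 4 xi
two_point = [
    (2, xi),                                  # p(1,1) = p(3,3)
    (4, 0.5 * (rho0 + rho1) - 2.0 * xi),      # p(1,2) and its three partners
    (2, xi - 0.5 * rho1),                     # p(1,3) = p(3,1)
    (1, 1.0 - 2.0 * rho0 - rho1 + 4.0 * xi),  # p(2,2)
]
one_point = [(2, 0.5 * rho0), (1, 1.0 - rho0)]
H = entropy_bits(two_point) - entropy_bits(one_point)    # H_2 - H_1, eq. (17)
print("xi =", xi, " H =", H, "bits")
```

Run as written, this gives $\xi$ close to 0.0103 and an entropy of roughly 1.30 bits, in line with the figures quoted above and comfortably above the one-bit rate of dicode.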

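The case K = 3, L = 2 begins to reveal the complexity of the general
case. We denote each of the 27 quantities p(i, j, k) by x with a subscript
ranging from 1 to 27. The association is made by listing the p's in order,
interpreting (ijk) as a three-digit number. Thus $x_1 = p(1,1,1)$, $x_2 = p(1,1,2)$,
$x_3 = p(1,1,3)$, ..., $x_{27} = p(3,3,3)$. Equations (26), (27),
and (28) can be solved to express all the x's in terms of five of them.
Equations (29) are

$$\begin{aligned}
\eta_1 &= -\tfrac{1}{4} + \rho_1 - \tfrac{1}{4}\rho_2 - \xi_1 + 3\xi_2 + 4\xi_3 + \tfrac{3}{2}\xi_4 + \tfrac{1}{4}\xi_5 \\
\eta_2 &= \tfrac{1}{4} - \tfrac{1}{2}\rho_1 + \tfrac{1}{4}\rho_2 + \xi_1 - 2\xi_2 - 3\xi_3 - \tfrac{3}{2}\xi_4 - \tfrac{1}{4}\xi_5 \\
\eta_3 &= -\tfrac{1}{8} + \tfrac{1}{4}\rho_0 - \tfrac{1}{8}\rho_2 - \tfrac{1}{2}\xi_1 + \tfrac{1}{4}\xi_4 + \tfrac{1}{8}\xi_5 \\
\eta_4 &= \tfrac{1}{2}\rho_0 - \rho_1 + \tfrac{1}{2}\rho_2 + \xi_1 - 4\xi_2 - 4\xi_3 - \xi_4 \\
\eta_5 &= \tfrac{1}{4} - \tfrac{1}{2}\rho_0 + \tfrac{1}{2}\rho_1 - \tfrac{1}{4}\rho_2 - \xi_1 + 2\xi_2 + 2\xi_3 + \tfrac{1}{2}\xi_4 - \tfrac{1}{4}\xi_5
\end{aligned}$$

$$\begin{aligned}
\eta_1 &= x_1 = x_{27} \\
\eta_2 &= x_2 = x_{10} = x_{18} = x_{26} \\
\eta_3 &= x_3 = x_9 = x_{19} = x_{25} \\
\eta_4 &= x_4 = x_{24} \\
\eta_5 &= x_5 = x_{13} = x_{15} = x_{23} \\
\xi_1 &= x_6 = x_{22} \\
\xi_2 &= x_7 = x_{21} \\
\xi_3 &= x_8 = x_{12} = x_{16} = x_{20} \\
\xi_4 &= x_{11} = x_{17} \\
\xi_5 &= x_{14}.
\end{aligned} \tag{37}$$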
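Equations (30) become

$$\begin{aligned}
\zeta_1 &= -\tfrac{1}{8} + \tfrac{1}{4}\rho_0 + \tfrac{1}{2}\rho_1 - \tfrac{1}{8}\rho_2 - \tfrac{1}{2}\xi_1 + \xi_2 + \xi_3 + \tfrac{1}{4}\xi_4 + \tfrac{1}{8}\xi_5 \\
\zeta_2 &= \tfrac{1}{4} - \tfrac{1}{2}\rho_1 + \tfrac{1}{4}\rho_2 + \xi_1 - 2\xi_2 - 2\xi_3 - \tfrac{1}{2}\xi_4 - \tfrac{1}{4}\xi_5 \\
\zeta_3 &= -\tfrac{1}{8} + \tfrac{1}{4}\rho_0 - \tfrac{1}{8}\rho_2 - \tfrac{1}{2}\xi_1 + \xi_2 + \xi_3 + \tfrac{1}{4}\xi_4 + \tfrac{1}{8}\xi_5 \\
\zeta_4 &= \tfrac{1}{2} - \rho_0 + \rho_1 - \tfrac{1}{2}\rho_2 - 2\xi_1 + 4\xi_2 + 4\xi_3 + \xi_4 + \tfrac{1}{2}\xi_5
\end{aligned} \tag{38}$$

where

$$\begin{aligned}
\zeta_1 &= p(1,1) = p(3,3) \\
\zeta_2 &= p(1,2) = p(3,2) = p(2,1) = p(2,3) \\
\zeta_3 &= p(1,3) = p(3,1) \\
\zeta_4 &= p(2,2).
\end{aligned}$$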
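For the case at hand, equations (32) become

$$\begin{aligned}
\xi_1^2 &= \frac{\zeta_1^{-1} \zeta_2^{4} \zeta_3^{-1} \zeta_4^{-2}}{\eta_1^{-2} \eta_2^{4} \eta_3^{-2} \eta_4^{2} \eta_5^{-4}}, &
\xi_2^2 &= \frac{\zeta_1^{2} \zeta_2^{-8} \zeta_3^{2} \zeta_4^{4}}{\eta_1^{6} \eta_2^{-8} \eta_4^{-8} \eta_5^{8}}, \\
\xi_3^4 &= \frac{\zeta_1^{2} \zeta_2^{-8} \zeta_3^{2} \zeta_4^{4}}{\eta_1^{8} \eta_2^{-12} \eta_4^{-8} \eta_5^{8}}, &
\xi_4^2 &= \frac{\zeta_1^{1/2} \zeta_2^{-2} \zeta_3^{1/2} \zeta_4}{\eta_1^{3} \eta_2^{-6} \eta_3 \eta_4^{-2} \eta_5^{2}}, \\
\xi_5 &= \frac{\zeta_1^{1/4} \zeta_2^{-1} \zeta_3^{1/4} \zeta_4^{1/2}}{\eta_1^{1/2} \eta_2^{-1} \eta_3^{1/2} \eta_5^{-1}}.
\end{aligned} \tag{39}$$

The right members of these equations can be written in terms of the
$\xi$'s by using (37) and (38). Equations (39) can then be written as five
multinomial equations in the five $\xi$'s. In principle, by using Sylvester's
method,^6 the $\xi$'s could be systematically eliminated to yield a single
high-order polynomial equation for $\xi_1$. The other $\xi$'s can be similarly
determined. To carry this out in practice would be a formidable task.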

V. THE COVARIANCE PROBLEM 

We have seen that the maxentropic discrete stationary process with
given values $x_1, x_2, \ldots, x_K$ and given truncated covariance sequence
$\rho_0, \rho_1, \ldots, \rho_L$ is an Lth-order Markov process. In Section III a formal
solution was given to the problem of determining the complete probability
structure of this process. This structure in turn determines
the remaining elements $\rho_{L+1}, \rho_{L+2}, \ldots$ of the covariance sequence.
It is shown in Appendix C that for a K-valued Lth-order Markov
process the covariance sequence can always be written in the form

$$\rho_n = \sum_{j} A_j\, \mu_j^{\,n}, \qquad n = 0, 1, 2, \ldots. \tag{40}$$

Thus, only a restricted class of covariance sequences, those expressible
as a finite sum of exponentials, can be obtained by our procedure.
The dicode covariance, $\rho_0 = \tfrac{1}{2}$, $\rho_1 = -\tfrac{1}{4}$, $\rho_n = 0$, $n = 2, 3, \ldots$, is
excluded, for example.

This raises an important pertinent question that we have sidestepped
thus far: what are the possible covariance sequences for a
discrete stationary process taking values $x_1, x_2, \ldots, x_K$? When the
restriction on allowed values of the process is removed, one has the
elegant Bochner theorem that characterizes the covariance sequences
as Fourier cosine-series coefficients of non-negative finite measures,
that is, as non-negative definite sequences. No comparable description
is available for the proper subset of these non-negative definite sequences
that comprises the covariance sequences of processes restricted to the
values $x_1, x_2, \ldots, x_K$. The situation has been discussed by
B. McMillan^7 and L. A. Shepp.^8

More germane to our discussion, and less ambitious, is the question:
"What sequences of L + 1 numbers, $\rho_0, \rho_1, \ldots, \rho_L$, can be the first
L + 1 terms of the covariance sequence of a discrete stationary process
taking values $x_1, \ldots, x_K$?" If we consider such a sequence as a point
in $E_{L+1}$, Euclidean space of L + 1 dimensions, then the region $\mathcal{R}$ of
admissible points is a convex one bounded by fewer than $2K^L$ hyperplanes.
This is shown in Appendix D. Such a region can be characterized
as the convex hull of its extreme points, or vertices (finite in number),
and a convenient description of the region is a list of these vertices.
An alternate economical description is a list of the hyperplane boundaries
of $\mathcal{R}$. We have been unable to sort out the combinatorics involved,
even in the simple case K = 2, $x_1 = 1$, $x_2 = -1$, to provide such lists
for arbitrary values of L.

It is to be expected that the formalisms of Section III will have
solutions $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$ that are probability distributions if and
only if $\rho_0, \rho_1, \ldots, \rho_L$ is contained in $\mathcal{R}$. In the few cases that we have
been able to carry out in detail, this is indeed the case. For example,
from Section IV, we see that the solution presented for the binary case
with L = 2 is valid only if

$$1 + 2\rho_1 + \rho_2 \ge 0,$$
$$1 - 2\rho_1 + \rho_2 \ge 0,$$
$$1 - \rho_2 \ge 0.$$

These inequalities do indeed describe the region of admissible values
of $\rho_1$ and $\rho_2$ for a process having values +1 and -1.

VI. THE UNIFILAR MARKOV MESSAGE SOURCE 

The dicode process is not an Lth-order Markov process for any L.
It can, however, be described simply in terms of a two-state Markov
chain. Consider the chain with states $S_1$ and $S_2$ and transition probabilities
$\tfrac{1}{2}$ as shown in Fig. 1. Along each transition path in the figure
is an associated number enclosed in a box. When the chain passes along
a path from one state to another, the associated number is "emitted."
The sequence of emitted numbers is the dicode process.

The foregoing is an example of a class of discrete processes which
we call unifilar Markov message sources. An ergodic Markov chain
with states $S_1, S_2, \ldots, S_N$ is given along with the transition probabilities
$p_{ij}$ = Pr {next state is $S_j$ | last state is $S_i$}. Associated with
each pair of states $S_i$, $S_j$ for which $p_{ij} > 0$ is a number $X(S_i, S_j)$
that is emitted when the chain passes from $S_i$ to $S_j$. The word unifilar
refers to the fact that we demand that whenever $S_j \ne S_k$,
then $X(S_i, S_j) \ne X(S_i, S_k)$, $i = 1, 2, \ldots, N$. If this condition is
met and the initial state of the chain is known, the sequence of emitted
letters determines the sequence of states followed by the chain and
a simple formula is available for the entropy of the emitted X process,
namely

$$H = -\sum_{i,j} p(S_i)\, p_{ij} \log p_{ij}. \tag{41}$$

(See Ref. 9, p. 68.) Here $p(S_i)$, the probability that the chain be in
state $S_i$, is the stationary measure satisfying

$$\sum_{i} p(S_i)\, p_{ij} = p(S_j), \qquad j = 1, 2, \ldots, N.$$




Fig. 1: Diagram of a two-state Markov chain with states $S_1$ and $S_2$ and transition
probabilities 1/2.
It is easy to write an expression for the covariance sequence of a
unifilar Markov message source. When $n \ge 2$, we have

$$\rho_n = \sum_{i,j,k,l} X(S_i, S_j)\, p(S_i)\, p_{ij}\, p_{jk}^{(n-1)}\, p_{kl}\, X(S_k, S_l) \tag{42}$$

where $p_{ik}^{(n)}$ is the probability that the chain be in state $S_k$ after n transitions
given that it started from $S_i$. We have also

$$\rho_1 = \sum_{i,j,k} X(S_i, S_j)\, p(S_i)\, p_{ij}\, p_{jk}\, X(S_j, S_k), \tag{43}$$

$$\rho_0 = \sum_{i,j} X^2(S_i, S_j)\, p(S_i)\, p_{ij}. \tag{44}$$

Since $p_{ik}^{(n)}$ has an expression analogous to (75), in Appendix C, equation
(42) can be written

$$\rho_n = \sum_{j} B_j\, \mu_j^{\,n}, \qquad n = 2, 3, \ldots. \tag{45}$$

Comparison with (40) shows that the covariance sequences achievable
are of almost the same form as for the Lth-order Markov processes.
For the unifilar Markov message source, deviation from the sum of
exponential form may occur for $\rho_0$ and $\rho_1$.
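Formulas (41) through (44) are straightforward to evaluate for the dicode source. In the sketch below, the state labelling and the emitted numbers are an assumed reading of Fig. 1 (state index = the last customer bit, emission = new bit minus old bit), and all function names are illustrative; the $n = 1$ case is handled by taking the $(n-1)$-step matrix to be the identity, which turns (42) into (43).

```python
import math


def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(P, n):
    """n-step transition matrix P^n (identity for n = 0)."""
    size = len(P)
    M = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        M = mat_mul(M, P)
    return M


def stationary(P, iters=500):
    """Stationary measure by power iteration (assumes an aperiodic chain)."""
    n = len(P)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]
    return p


def is_unifilar(P, X):
    """Distinct successors of a state must emit distinct numbers."""
    for i in range(len(P)):
        emitted = [X[i][j] for j in range(len(P)) if P[i][j] > 0.0]
        if len(set(emitted)) != len(emitted):
            return False
    return True


def entropy_rate(P, p):
    """Equation (41), in bits."""
    n = len(P)
    return -sum(p[i] * P[i][j] * math.log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0.0)


def covariance(P, X, p, n):
    """Equation (44) for n = 0, otherwise (42)/(43)."""
    r = range(len(P))
    if n == 0:
        return sum(X[i][j] ** 2 * p[i] * P[i][j] for i in r for j in r)
    M = mat_pow(P, n - 1)
    return sum(X[i][j] * p[i] * P[i][j] * M[j][k] * P[k][l] * X[k][l]
               for i in r for j in r for k in r for l in r)


# Assumed dicode source: state = last bit Y, emission X = Y_new - Y_old.
P = [[0.5, 0.5], [0.5, 0.5]]
X = [[0, 1], [-1, 0]]
p = stationary(P)
print(is_unifilar(P, X), entropy_rate(P, p))
print([covariance(P, X, p, n) for n in range(4)])
```

Under these assumptions the source is unifilar, its entropy is one bit per emitted letter, and the covariance sequence is 1/2, -1/4, 0, 0, ..., which illustrates how $\rho_0$ and $\rho_1$ can depart from the pure sum-of-exponentials form.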

To find the unifilar Markov message source of largest entropy with 
N states and given truncated covariance sequence appears to be a 
most difficult problem. We have not found a unifilar Markov source 
with values 0, ±1, with the dicode covariance sequence and an entropy 
greater than unity. 

VII. THE N-VARIATE GAUSSIAN ANALOGUE 

Closely related to the problem we have been discussing is the following 
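question. Let $X_1, X_2, \ldots, X_n$ be n random variables each taking
values from the set $x_1, x_2, \ldots, x_K$. What distribution for the X's,
$p_n(\epsilon_1, \ldots, \epsilon_n)$ say, has maximum entropy and given second moments
$E X_i X_j = \rho_{ij}$? Using Lagrange multipliers, one finds at once that

$$p_n(\epsilon_1, \ldots, \epsilon_n) = c \exp \left\{ -\tfrac{1}{2} \sum_{i,j=1}^{n} \sigma_{ij}\, x_{\epsilon_i} x_{\epsilon_j} \right\}. \tag{46}$$

Here

$$\frac{1}{c} = S = \sum_{\epsilon} \exp \left\{ -\tfrac{1}{2} \sum_{i,j=1}^{n} \sigma_{ij}\, x_{\epsilon_i} x_{\epsilon_j} \right\} \tag{47}$$

and the $\sigma$'s must be determined so that

$$\rho_{ij} = -\frac{2}{S} \frac{\partial S}{\partial \sigma_{ij}} = c \sum_{\epsilon} x_{\epsilon_i} x_{\epsilon_j} \exp \left\{ -\tfrac{1}{2} \sum_{k,l=1}^{n} \sigma_{kl}\, x_{\epsilon_k} x_{\epsilon_l} \right\}, \qquad i, j = 1, \ldots, n. \tag{48}$$

The entropy of (46) can be written

$$H = -\sum_{\epsilon} p_n(\epsilon_1, \ldots, \epsilon_n) \log p_n(\epsilon_1, \ldots, \epsilon_n) = \log S + \tfrac{1}{2} \sum_{i,j} \rho_{ij}\, \sigma_{ij}. \tag{49}$$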

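The analogy of (46) with the n-variate Gaussian density is striking.
Let $Y_1, Y_2, \ldots, Y_n$ be n real-valued random variables having probability
density $p_n(y_1, y_2, \ldots, y_n)$. Let $E Y_i Y_j = \rho_{ij}$. Under these
constraints, the density having largest entropy is the Gaussian density

$$p_n(y_1, \ldots, y_n) = c \exp \left\{ -\tfrac{1}{2} \sum_{i,j} \sigma_{ij}\, y_i y_j \right\}. \tag{50}$$

Here

$$\frac{1}{c} = S = \int_{-\infty}^{\infty} dy_1 \cdots \int_{-\infty}^{\infty} dy_n \exp \left\{ -\tfrac{1}{2} \sum_{i,j} \sigma_{ij}\, y_i y_j \right\} \tag{51}$$

and the $\sigma$'s are related to the $\rho$'s by

$$\rho_{ij} = -\frac{2}{S} \frac{\partial S}{\partial \sigma_{ij}}, \qquad i, j = 1, 2, \ldots, n. \tag{52}$$

The entropy of (50) can be written

$$H = -\int_{-\infty}^{\infty} dy_1 \cdots \int_{-\infty}^{\infty} dy_n\, p_n(y_1, \ldots, y_n) \log p_n(y_1, \ldots, y_n) = \log S + \tfrac{1}{2} \sum_{i,j} \rho_{ij}\, \sigma_{ij}. \tag{53}$$

Note the complete parallel between (46) through (49) and (50) through
(53).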

In the case of the Gaussian density, the integral (51) can be performed
to yield the simple expression

$$S = \frac{(2\pi)^{n/2}}{|\sigma|^{1/2}} \tag{54}$$

where $|\sigma|$ is the determinant of the matrix with elements $\sigma_{ij}$. Equation
(52) then gives at once the well-known results $\rho_{ij} = (\sigma^{-1})_{ij}$ or $\sigma = \rho^{-1}$,
where we use obvious matrix notation. Surprisingly, the $\sigma_{ij}$ are rational
in the $\rho_{ij}$ in spite of the more complicated nature of the dependence of
S on the $\sigma_{ij}$, as given by (54).
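In the discrete case, matters are not so simple. For example, when
n = 3, K = 2 and $x_1 = 1$, $x_2 = -1$,

$$S = 2 e^{-\frac{1}{2}(\sigma_{11} + \sigma_{22} + \sigma_{33})} \left[ e^{-(\sigma_{12} + \sigma_{13} + \sigma_{23})} + e^{-(\sigma_{12} - \sigma_{13} - \sigma_{23})} + e^{-(-\sigma_{12} + \sigma_{13} - \sigma_{23})} + e^{-(-\sigma_{12} - \sigma_{13} + \sigma_{23})} \right].$$

One finds

$$\sigma_{12} = -\tfrac{1}{4} \log \frac{\beta_1 \beta_2}{\beta_3 \beta_4}, \qquad
\sigma_{13} = -\tfrac{1}{4} \log \frac{\beta_1 \beta_3}{\beta_2 \beta_4}, \qquad
\sigma_{23} = -\tfrac{1}{4} \log \frac{\beta_1 \beta_4}{\beta_2 \beta_3},$$

$$c = \tfrac{1}{8} [\beta_1 \beta_2 \beta_3 \beta_4]^{1/4}$$

where

$$\begin{aligned}
\beta_1 &= 1 + \rho_{12} + \rho_{13} + \rho_{23} \\
\beta_2 &= 1 + \rho_{12} - \rho_{13} - \rho_{23} \\
\beta_3 &= 1 - \rho_{12} + \rho_{13} - \rho_{23} \\
\beta_4 &= 1 - \rho_{12} - \rho_{13} + \rho_{23}.
\end{aligned}$$

Thus $\sigma_{12}$, $\sigma_{13}$, and $\sigma_{23}$ are not rational in the $\rho$'s. (The quantities $\sigma_{11}$,
$\sigma_{22}$, and $\sigma_{33}$ can be chosen to be zero in this binary case.) The probabilities
themselves, however, turn out to be linear in the $\rho$'s. One has
$p_3(1,1,1) = p_3(2,2,2) = \tfrac{1}{8}\beta_1$, $p(1,1,2) = p(2,2,1) = \tfrac{1}{8}\beta_2$,
$p(1,2,1) = p(2,1,2) = \tfrac{1}{8}\beta_3$, $p(1,2,2) = p(2,1,1) = \tfrac{1}{8}\beta_4$. When n > 3, the
p's become algebraic in the $\rho$'s.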
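The n = 3 binary formulas can be verified directly: the linear probabilities $\tfrac{1}{8}\beta$ should coincide with the exponential form (46) evaluated at the stated $\sigma$'s. The following sketch assumes arbitrary test moments $\rho_{12} = 0.3$, $\rho_{13} = -0.2$, $\rho_{23} = 0.1$ (chosen here so that every $\beta$ is positive), takes $\sigma_{11} = \sigma_{22} = \sigma_{33} = 0$, and uses natural logarithms throughout.

```python
import itertools
import math

r12, r13, r23 = 0.3, -0.2, 0.1             # hypothetical second moments

b1 = 1.0 + r12 + r13 + r23
b2 = 1.0 + r12 - r13 - r23
b3 = 1.0 - r12 + r13 - r23
b4 = 1.0 - r12 - r13 + r23

s12 = -0.25 * math.log(b1 * b2 / (b3 * b4))
s13 = -0.25 * math.log(b1 * b3 / (b2 * b4))
s23 = -0.25 * math.log(b1 * b4 / (b2 * b3))
c = 0.125 * (b1 * b2 * b3 * b4) ** 0.25


def p_linear(x1, x2, x3):
    """The probabilities (1/8) beta, written for amplitudes +-1."""
    return (1.0 + r12 * x1 * x2 + r13 * x1 * x3 + r23 * x2 * x3) / 8.0


def p_exponential(x1, x2, x3):
    """Equation (46) with zero diagonal sigma's."""
    return c * math.exp(-(s12 * x1 * x2 + s13 * x1 * x3 + s23 * x2 * x3))


points = list(itertools.product((1, -1), repeat=3))
max_gap = max(abs(p_linear(*x) - p_exponential(*x)) for x in points)

# Entropy two ways (in nats): directly, and through equation (49).
h_direct = -sum(p_linear(*x) * math.log(p_linear(*x)) for x in points)
h_formula = -math.log(c) + (r12 * s12 + r13 * s13 + r23 * s23)
print(max_gap, h_direct, h_formula)
```

The gap between the two forms of the distribution is at rounding level, and the two entropy values agree, which is consistent with (49) when the symmetric off-diagonal terms are counted twice in the double sum.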

VIII. CONCLUDING REMARKS 

In addition to the methods discussed here, I have pursued several 
other attacks on the problem at hand. All approaches seem to end 
in unmanageable algebraic complexities. Perhaps it is the nature of 
the problem; perhaps the answer can't be stated in simple terms. The 
mathematician, whose pleasure it is to make order out of chaos, will 
likely disagree. He will feel that surely so basic a construct as we consider
here must be simple at heart and that we have only failed to
find the appropriate language to make it so appear. In analogy, to
the uninitiated, the relationship found in the last section between $\rho$
and $\sigma$, namely $\sigma = \rho^{-1}$, must surely at first have appeared formidable.
The matrix language of Cayley brings us apparent order here. Can we
find the right point of view in which to describe the discrete maxentropic
process?

APPENDIX A

We are concerned with maximizing (8) subject to (5), (6), (7), (10),
and the L + 1 constraints

$$\sum \cdots \sum x_{\epsilon_1} x_{\epsilon_{k+1}}\, p_{k+1}(\epsilon_1, \ldots, \epsilon_{k+1}) = \rho_k, \qquad k = 0, 1, \ldots, L. \tag{55}$$

We proceed by maximizing $H_n$ of (9), subject to these same constraints,
for each n > L + 1.

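Observe now that (55) and (10) can be stated solely in terms of
$p_{L+1}$:

$$\sum \cdots \sum x_{\epsilon_1} x_{\epsilon_{k+1}}\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = \rho_k, \qquad k = 0, 1, \ldots, L, \tag{56}$$

$$\sum x_{\epsilon_1}\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = 0. \tag{57}$$

Thus the maximization can be carried out by first maximizing $H_n$
subject to (5), (6), and (7) given the $K^{L+1}$ quantities $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$,
$\epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K$, then maximizing further over these
quantities subject to the additional constraints (56) and (57). For
this first maximization problem, the constraints are (5), (6), (7), and

$$\begin{aligned}
p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})
 &= \sum_{\alpha} p_n(\epsilon_1, \ldots, \epsilon_{L+1}, \alpha_1, \ldots, \alpha_{n-L-1}) \\
 &= \sum_{\alpha} p_n(\alpha_1, \epsilon_1, \ldots, \epsilon_{L+1}, \alpha_2, \ldots, \alpha_{n-L-1}) \\
 &\;\;\vdots \\
 &= \sum_{\alpha} p_n(\alpha_1, \ldots, \alpha_{n-L-1}, \epsilon_1, \ldots, \epsilon_{L+1}),
 \qquad \epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K
\end{aligned} \tag{58}$$

where we regard the $p_{L+1}$ as given. These quantities, of course, must
themselves satisfy (5), (6), and (7) with n = L + 1.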

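Introducing Lagrange multipliers we seek the maximum of

$$J = -\sum p_n(\epsilon_1, \ldots, \epsilon_n) \log p_n(\epsilon_1, \ldots, \epsilon_n) + \lambda \sum p_n(\epsilon_1, \ldots, \epsilon_n) + \sum_{j=0}^{n-L-1} \sum_{\alpha, \epsilon} \mu^{(j)}_{\epsilon_1 \cdots \epsilon_{L+1}}\, p_n(\alpha_1, \ldots, \alpha_j, \epsilon_1, \ldots, \epsilon_{L+1}, \alpha_{j+1}, \ldots, \alpha_{n-L-1}) \tag{59}$$

where in the last sum for j = 0 the first argument of $p_n$ is $\epsilon_1$ and for
j = n - L - 1 the last argument of $p_n$ is $\epsilon_{L+1}$. In (59) we have omitted
terms corresponding to the constraint (7). It turns out that this constraint
will be met automatically. Differentiation of (59) with respect
to $p_n(\epsilon_1, \ldots, \epsilon_n)$ yields

$$-1 - \log p_n(\epsilon_1, \ldots, \epsilon_n) + \lambda + \mu^{(0)}_{\epsilon_1 \cdots \epsilon_{L+1}} + \mu^{(1)}_{\epsilon_2 \cdots \epsilon_{L+2}} + \cdots + \mu^{(n-L-1)}_{\epsilon_{n-L} \cdots \epsilon_n} = 0$$

or

$$p_n(\epsilon_1, \ldots, \epsilon_n) = c \exp \left\{ \mu^{(0)}_{\epsilon_1 \cdots \epsilon_{L+1}} + \mu^{(1)}_{\epsilon_2 \cdots \epsilon_{L+2}} + \cdots + \mu^{(n-L-1)}_{\epsilon_{n-L} \cdots \epsilon_n} \right\} \tag{60}$$

where c is independent of the $\epsilon$'s. Equations (58) and (5) serve in principle
to determine c and the $K^{L+1}(n - L)$ Lagrange multipliers $\mu^{(j)}_{\epsilon_1 \cdots \epsilon_{L+1}}$.
Note now that from (60) we find that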


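$$\begin{aligned}
\Pr\{X_n = x_{\epsilon_n} \mid X_1 = x_{\epsilon_1}, \ldots, X_{n-1} = x_{\epsilon_{n-1}}\}
&= \frac{p_n(\epsilon_1, \ldots, \epsilon_n)}{\sum_{\alpha=1}^{K} p_n(\epsilon_1, \ldots, \epsilon_{n-1}, \alpha)} \\
&= \frac{\exp\left(\zeta^{(n-L-1)}_{\epsilon_{n-L} \cdots \epsilon_n}\right)}{\sum_{\alpha=1}^{K} \exp\left(\zeta^{(n-L-1)}_{\epsilon_{n-L} \cdots \epsilon_{n-1} \alpha}\right)}
\equiv f_n(\epsilon_{n-L}, \epsilon_{n-L+1}, \ldots, \epsilon_n),
\end{aligned}$$

since of the multipliers in (60) only the last, $\zeta^{(n-L-1)}_{\epsilon_{n-L} \cdots \epsilon_n}$, involves $\epsilon_n$. Similarly, for each $m$
satisfying $L + 1 \le m \le n$ we find from (60) that

$$\begin{aligned}
\Pr\{X_m = x_{\epsilon_m} \mid X_1 = x_{\epsilon_1}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}
&= \frac{\Pr\{X_1 = x_{\epsilon_1}, \ldots, X_m = x_{\epsilon_m}\}}{\Pr\{X_1 = x_{\epsilon_1}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}} \\
&= f_m(\epsilon_{m-L}, \epsilon_{m-L+1}, \ldots, \epsilon_m), \qquad L + 1 \le m \le n,
\end{aligned} \tag{61}$$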

that is, this conditional probability depends only on $\epsilon_{m-L}, \epsilon_{m-L+1}, \ldots, \epsilon_m$.
Writing (61) in another form,

$$\Pr\{X_1 = x_{\epsilon_1}, \ldots, X_m = x_{\epsilon_m}\} = f_m(\epsilon_{m-L}, \ldots, \epsilon_m)\, \Pr\{X_1 = x_{\epsilon_1}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}$$

and then summing on $\epsilon_1, \epsilon_2, \ldots, \epsilon_{m-L-1}$, we find that

$$\Pr\{X_{m-L} = x_{\epsilon_{m-L}}, \ldots, X_m = x_{\epsilon_m}\} = f_m(\epsilon_{m-L}, \ldots, \epsilon_m)\, \Pr\{X_{m-L} = x_{\epsilon_{m-L}}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}. \tag{62}$$

We have then, substituting the value of $f_m$ from (62) into (61),

$$\begin{aligned}
\Pr\{X_m = x_{\epsilon_m} \mid X_1 = x_{\epsilon_1}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}
&= \frac{\Pr\{X_{m-L} = x_{\epsilon_{m-L}}, \ldots, X_m = x_{\epsilon_m}\}}{\Pr\{X_{m-L} = x_{\epsilon_{m-L}}, \ldots, X_{m-1} = x_{\epsilon_{m-1}}\}} \\
&= \Pr\{X_m = x_{\epsilon_m} \mid X_{m-1} = x_{\epsilon_{m-1}}, \ldots, X_{m-L} = x_{\epsilon_{m-L}}\}, \\
&\qquad L + 1 \le m \le n.
\end{aligned} \tag{63}$$
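The step from the product form (60) to the Markov property (63) can be checked numerically. The following is a minimal sketch, assuming small illustrative sizes (K = 2, L = 1, n = 4), randomly drawn multipliers, and 0-based symbol labels; the function names are hypothetical and none of this appears in the original paper.

```python
import itertools
import math
import random


def joint_from_multipliers(zeta, K, L, n):
    """Normalized table p_n(eps) proportional to exp(sum_j zeta[j][window_j]), cf. (60)."""
    weights = {}
    for eps in itertools.product(range(K), repeat=n):
        # window j covers positions j .. j+L of the sequence (L+1 symbols)
        exponent = sum(zeta[j][eps[j:j + L + 1]] for j in range(n - L))
        weights[eps] = math.exp(exponent)
    total = sum(weights.values())
    return {eps: w / total for eps, w in weights.items()}


def conditional_on_past(p, K):
    """Pr{X_n = eps_n | X_1 .. X_{n-1}} for every full sequence eps."""
    cond = {}
    for eps, value in p.items():
        denom = sum(p[eps[:-1] + (a,)] for a in range(K))
        cond[eps] = value / denom
    return cond


def max_spread_over_tails(cond, K, L):
    """Largest variation of the conditional among sequences sharing the last L+1 symbols."""
    worst = 0.0
    for tail in itertools.product(range(K), repeat=L + 1):
        vals = [c for eps, c in cond.items() if eps[-(L + 1):] == tail]
        worst = max(worst, max(vals) - min(vals))
    return worst


K, L, n = 2, 1, 4  # illustrative sizes, not taken from the paper
rng = random.Random(0)
zeta = [
    {w: rng.uniform(-1.0, 1.0) for w in itertools.product(range(K), repeat=L + 1)}
    for _ in range(n - L)
]
p_n = joint_from_multipliers(zeta, K, L, n)
spread = max_spread_over_tails(conditional_on_past(p_n, K), K, L)
print(spread)  # expected to vanish up to rounding
```

A vanishing spread indicates that the conditional probability of the last symbol is the same for all pasts that agree in their final L symbols, which is the content of (63) for m = n.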

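Let us now define

$$p_L(\epsilon_1, \ldots, \epsilon_L) = \sum_{\alpha=1}^{K} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) \tag{64}$$

$$q(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) = \frac{p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})}{p_L(\epsilon_1, \ldots, \epsilon_L)}. \tag{65}$$

Repeated application of (63) then shows that

$$\begin{aligned}
\Pr\{X_1 = x_{\epsilon_1}, \ldots, X_m = x_{\epsilon_m}\}
= {} & p_L(\epsilon_1, \ldots, \epsilon_L)\, q(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) \\
& \cdot q(\epsilon_{L+2} \mid \epsilon_2, \ldots, \epsilon_{L+1}) \cdots q(\epsilon_m \mid \epsilon_{m-L}, \epsilon_{m-L+1}, \ldots, \epsilon_{m-1}), \\
& m = L + 1, L + 2, \ldots, n.
\end{aligned}$$

This expression is independent of $n$. Thus, among stationary processes
with a given $(L + 1)$st-order distribution $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$, the $L$th-order
Markov process generated by the initial distribution (64) and
the transition mechanism (65) has maximum entropy $H_n$ for
every $n \ge L + 1$. Q.E.D.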
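The construction in (64) and (65) can be sketched in a few lines. The example below assumes a small, strictly positive, stationary second-order table (K = 2, L = 1) chosen only for illustration, with 0-based symbol labels and hypothetical function names; it builds the n-symbol distribution from the product formula and checks that its marginals reproduce the given $p_{L+1}$.

```python
import itertools


def marginal_pL(p_next, K, L):
    """(64): sum p_{L+1} over its last argument."""
    return {
        head: sum(p_next[head + (a,)] for a in range(K))
        for head in itertools.product(range(K), repeat=L)
    }


def sequence_probability(eps, p_next, K, L):
    """p_L(first L symbols) times one factor q per later symbol, with q as in (65)."""
    p_L = marginal_pL(p_next, K, L)
    prob = p_L[eps[:L]]
    for m in range(L, len(eps)):
        window = eps[m - L:m + 1]  # the L previous symbols plus the new one
        prob *= p_next[window] / p_L[window[:-1]]  # assumes p_L > 0
    return prob


K, L = 2, 1
# Illustrative stationary table: row and column marginals agree.
p2 = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

n = 4
p_n = {
    eps: sequence_probability(eps, p2, K, L)
    for eps in itertools.product(range(K), repeat=n)
}

# Marginals over the first and over the last L+1 coordinates.
first = {w: sum(v for e, v in p_n.items() if e[:L + 1] == w) for w in p2}
last = {w: sum(v for e, v in p_n.items() if e[-(L + 1):] == w) for w in p2}
```

Agreement of both marginals with the input table is what one would expect from a stationary process whose $(L+1)$st-order distribution is the prescribed one.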

APPENDIX B 

We seek to maximize $H_{L+1} - H_L$ by choice of the $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$
subject to the stationarity constraints

$$\sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) = \sum_{\alpha} p_{L+1}(\alpha, \epsilon_1, \ldots, \epsilon_L), \qquad \epsilon_1, \ldots, \epsilon_L = 1, \ldots, K, \tag{66}$$

the distribution constraint

$$\sum_{\epsilon} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = 1, \tag{67}$$

and the covariance restrictions

$$\sum_{\epsilon} h^{(j)}_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = (L + 1 - j)\rho_j, \qquad j = 0, 1, \ldots, L, \tag{68}$$

where the $h^{(j)}_{L+1}$ are defined in (19). The constraints (68) treat the variables
in a more symmetric manner than do the constraints

$$\sum_{\epsilon} x_{\epsilon_1} x_{\epsilon_{k+1}}\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = \rho_k, \qquad k = 0, 1, \ldots, L. \tag{69}$$

It is easy to show that (66), (67), and (68) are equivalent to (66),
(67), and (69).
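Introducing Lagrange multipliers, we must maximize

$$\begin{aligned}
J' = {} & -\sum_{\epsilon} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \log p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \\
& + \sum_{\epsilon} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \log \Big[ \sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) \Big] \\
& + \sum_{\epsilon} \nu_{\epsilon_1 \cdots \epsilon_L} \Big[ \sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) - \sum_{\alpha} p_{L+1}(\alpha, \epsilon_1, \ldots, \epsilon_L) \Big] \\
& + \mu \sum_{\epsilon} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \\
& + \sum_{j=0}^{L} \lambda_j \sum_{\epsilon} h^{(j)}_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}).
\end{aligned}$$

Differentiation with respect to $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$ gives the necessary
condition

$$\begin{aligned}
& -1 - \log p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) + 1 + \log \sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha) \\
& \quad + \nu_{\epsilon_1 \cdots \epsilon_L} - \nu_{\epsilon_2 \cdots \epsilon_{L+1}} + \mu + \sum_{j=0}^{L} \lambda_j h^{(j)}_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = 0.
\end{aligned} \tag{70}$$

With the notation $c = e^{\mu}$, $f(\epsilon_1, \ldots, \epsilon_L) = e^{-\nu_{\epsilon_1 \cdots \epsilon_L}}$ and the definition
(20), (70) becomes (22).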
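The two logarithmic terms in the necessary condition can be checked by hand. A brief worked step follows, offered as a reader's check under the assumption that $p_L$ denotes the marginal $\sum_{\alpha} p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \alpha)$ as in (64); the primed summation variables are introduced here only for illustration.

$$\frac{\partial}{\partial p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})} \Big[ -\sum_{\epsilon'} p_{L+1}(\epsilon') \log p_{L+1}(\epsilon') \Big] = -1 - \log p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}),$$

$$\frac{\partial}{\partial p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})} \sum_{\epsilon'} p_{L+1}(\epsilon') \log p_L(\epsilon'_1, \ldots, \epsilon'_L) = \log p_L(\epsilon_1, \ldots, \epsilon_L) + \sum_{\beta} \frac{p_{L+1}(\epsilon_1, \ldots, \epsilon_L, \beta)}{p_L(\epsilon_1, \ldots, \epsilon_L)} = \log p_L(\epsilon_1, \ldots, \epsilon_L) + 1.$$

The constants $-1$ and $+1$ cancel, so the condition fixes the ratio $p_{L+1}/p_L$, that is, the transition probability, as an exponential of the multiplier terms.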



650 THE BELL SYSTEM TECHNICAL JOURNAL, MARCH 1972 

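Inserting (22) in (14) yields (21). Again using (22) for $q_L$ in (12)
gives

$$\sum_{\epsilon_1} h_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})\, \frac{p_L(\epsilon_1, \ldots, \epsilon_L)}{f(\epsilon_1, \ldots, \epsilon_L)} = \frac{1}{c}\, \frac{p_L(\epsilon_2, \ldots, \epsilon_{L+1})}{f(\epsilon_2, \ldots, \epsilon_{L+1})}$$

where for simplicity we suppress the $\lambda$ dependence of $h_{L+1}$. Relabeling
variables, this can be written

$$\sum_{\epsilon_{L+1}} h_{L+1}(\epsilon_{L+1}, \ldots, \epsilon_1)\, \frac{p_L(\epsilon_{L+1}, \ldots, \epsilon_2)}{f(\epsilon_{L+1}, \ldots, \epsilon_2)} = \frac{1}{c}\, \frac{p_L(\epsilon_L, \ldots, \epsilon_1)}{f(\epsilon_L, \ldots, \epsilon_1)}, \qquad \epsilon_1, \ldots, \epsilon_L = 1, 2, \ldots, K. \tag{71}$$

But from the definition (20) and (19), $h_{L+1}(\epsilon_{L+1}, \ldots, \epsilon_1) = h_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$.
Equation (71) is then

$$\sum_{\epsilon_{L+1}} h_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}; \lambda_0, \ldots, \lambda_L)\, \frac{p_L(\epsilon_{L+1}, \ldots, \epsilon_2)}{f(\epsilon_{L+1}, \ldots, \epsilon_2)} = \frac{1}{c}\, \frac{p_L(\epsilon_L, \ldots, \epsilon_1)}{f(\epsilon_L, \ldots, \epsilon_1)}.$$

Comparison with (21) now shows that we must have

$$\frac{p_L(\epsilon_L, \ldots, \epsilon_1)}{f(\epsilon_L, \ldots, \epsilon_1)} = A f(\epsilon_1, \ldots, \epsilon_L) \tag{72}$$

if the eigenvalue $1/c$ is not degenerate, which is the general case. A
change of notation reduces (72) to (23).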

APPENDIX C

We have considered stationary $L$th-order Markov processes
$\ldots, X_{-1}, X_0, X_1, \ldots$ whose variables take values $x_1, x_2, \ldots, x_K$.
The probability structure of such a process can be generated from
transition probabilities $q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L)$ via the mechanism of equations
(12) through (16). Such a process can also be regarded as a function
on the states of an ordinary Markov chain. The chain has $K^L$ states,
each one labeled by an $L$-tuple of integers $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_L)$. The
conditional probability that the chain pass in one move from state $\alpha$
to state $\beta$ is given by

$$p(\beta \mid \alpha) = q_L(\beta_L \mid \alpha_1, \alpha_2, \ldots, \alpha_L)\, \delta_{\alpha_2 \beta_1} \delta_{\alpha_3 \beta_2} \cdots \delta_{\alpha_L \beta_{L-1}}, \tag{73}$$

where $\delta_{ij}$ is the Kronecker symbol. The chain thus has a very special
nature. One can pass only to a state whose initial $L - 1$ labels agree
with the last $L - 1$ labels of the state just left. The states correspond
to successive $L$-tuples, $X_j, X_{j+1}, \ldots, X_{j+L-1}$, in the $L$th-order Markov
process. We regain that process from the chain by defining on the
states of the chain the numerical valued function

$$X_{\alpha} = x_{\alpha_1}. \tag{74}$$
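The chain on $L$-tuples described by (73) can be sketched as a transition matrix. The example below assumes illustrative values K = 2, L = 2 and an arbitrary normalized table of transition probabilities, keyed 0-based as `(alpha_1, ..., alpha_L, b)`; the function name is hypothetical.

```python
import itertools


def chain_transition_matrix(q, K, L):
    """Build p(beta | alpha) on L-tuples from q[(alpha_1..alpha_L, b)], following (73)."""
    states = list(itertools.product(range(K), repeat=L))
    index = {s: i for i, s in enumerate(states)}
    P = [[0.0] * len(states) for _ in states]
    for alpha in states:
        for b in range(K):
            beta = alpha[1:] + (b,)  # shift the window and append the new symbol
            P[index[alpha]][index[beta]] = q[alpha + (b,)]
    return states, index, P


K, L = 2, 2
# Illustrative transition probabilities (an assumption, not from the paper).
q = {}
for alpha in itertools.product(range(K), repeat=L):
    p_zero = 0.2 + 0.15 * sum(alpha)
    q[alpha + (0,)] = p_zero
    q[alpha + (1,)] = 1.0 - p_zero

states, index, P = chain_transition_matrix(q, K, L)
for s, row in zip(states, P):
    print(s, row)
```

Each row has at most K nonzero entries, reflecting the remark that a move is possible only to a state whose first L − 1 labels match the last L − 1 labels of the state just left.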

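From well-known results in the theory of finite Markov chains
(see for example Ref. 10, Chapt. 16, Sec. 1), we see that the probability
of going from state $\alpha$ to state $\beta$ in exactly $n$ moves can be written in
the form

$$p^{(n)}(\beta \mid \alpha) = \sum_{j=1}^{K^L} \mu_j^n\, u^{(j)}_{\beta} v^{(j)}_{\alpha}, \qquad n = 1, 2, \ldots, \qquad \alpha_1, \ldots, \alpha_L, \beta_1, \ldots, \beta_L = 1, 2, \ldots, K. \tag{75}$$

Here the $u$'s and $v$'s are left and right eigenvectors of $p(\beta \mid \alpha)$,

$$\sum_{\alpha} u^{(j)}_{\alpha}\, p(\beta \mid \alpha) = \mu_j u^{(j)}_{\beta}, \qquad \sum_{\beta} p(\beta \mid \alpha)\, v^{(j)}_{\beta} = \mu_j v^{(j)}_{\alpha},$$

$$j = 1, 2, \ldots, K^L, \qquad \alpha_1, \ldots, \alpha_L, \beta_1, \ldots, \beta_L = 1, 2, \ldots, K,$$

normalized so that

$$\sum_{j=1}^{K^L} u^{(j)}_{\beta} v^{(j)}_{\alpha} = \delta_{\alpha\beta}.$$

Note that this gives (75) the special value

$$p^{(0)}(\beta \mid \alpha) = \delta_{\alpha\beta}.$$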

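In terms of this Markov chain, the covariance of the $X$ process
can be written

$$\rho_n = E X_j X_{j+n} = \sum_{\alpha} \sum_{\beta} x_{\alpha_1} x_{\beta_1}\, p(\alpha)\, p^{(n)}(\beta \mid \alpha), \qquad n = 0, 1, 2, \ldots \tag{76}$$

where $p(\alpha)$ is the stationary distribution for the chain, i.e.,
$\sum_{\alpha} p(\alpha)\, p(\beta \mid \alpha) = p(\beta)$.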
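The covariance formula (76) can be evaluated directly with matrix powers. The sketch below makes several assumptions for illustration: a symmetric two-state chain (K = 2, L = 1) with stay probability 0.8, values x = (−1, +1), a state function that reads off the first label of each tuple, and a stationary distribution found by simple iteration. For this particular chain the covariance is known to be $(2 \cdot 0.8 - 1)^n = 0.6^n$, which gives a convenient check.

```python
def stationary_distribution(P, iters=500):
    """Iterate pi <- pi P from the uniform vector (assumes the chain is ergodic)."""
    size = len(P)
    pi = [1.0 / size] * size
    for _ in range(iters):
        pi = [sum(pi[a] * P[a][b] for a in range(size)) for b in range(size)]
    return pi


def covariance(P, states, x, n_max):
    """rho_n = sum_{alpha,beta} x[alpha_1] x[beta_1] p(alpha) p^(n)(beta | alpha), cf. (76)."""
    size = len(P)
    pi = stationary_distribution(P)
    # Pn[a][b] holds p^(n)(b | a); it starts at n = 0 as the identity matrix.
    Pn = [[1.0 if a == b else 0.0 for b in range(size)] for a in range(size)]
    rho = []
    for _ in range(n_max + 1):
        rho.append(sum(
            x[states[a][0]] * x[states[b][0]] * pi[a] * Pn[a][b]
            for a in range(size) for b in range(size)
        ))
        # advance to the next power: Pn <- Pn P
        Pn = [[sum(Pn[a][c] * P[c][b] for c in range(size)) for b in range(size)]
              for a in range(size)]
    return rho


states = [(0,), (1,)]           # L = 1: each state is a 1-tuple
P = [[0.8, 0.2], [0.2, 0.8]]    # illustrative symmetric chain
x = [-1.0, 1.0]                 # illustrative zero-mean values
rho = covariance(P, states, x, n_max=4)
print(rho)
```

The geometric decay of the computed sequence illustrates how each covariance value is a combination of $n$th powers of the chain's eigenvalues; here only the eigenvalue 0.6 contributes because the process has mean zero.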

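Using (75) in (76) we have the desired result

$$\rho_n = \sum_{j=1}^{K^L} a_j \mu_j^n, \qquad n = 0, 1, 2, \ldots$$

where

$$a_j = \sum_{\alpha} \sum_{\beta} x_{\alpha_1} x_{\beta_1}\, p(\alpha)\, u^{(j)}_{\beta} v^{(j)}_{\alpha}.$$
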
APPENDIX D

For any discrete stationary process taking values $x_1, x_2, \ldots, x_K$,
the truncated covariance sequence can be formed from the $(L + 1)$st-order
distribution $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$. Thus

$$\rho_i = \sum_{\epsilon} x_{\epsilon_1} x_{\epsilon_{i+1}}\, p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}), \qquad i = 0, 1, \ldots, L. \tag{77}$$

Such a distribution satisfies the stationarity conditions

$$\sum_{\alpha} p_{L+1}(\alpha, \epsilon_2, \ldots, \epsilon_{L+1}) = \sum_{\alpha} p_{L+1}(\epsilon_2, \ldots, \epsilon_{L+1}, \alpha), \qquad \epsilon_2, \epsilon_3, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K, \tag{78}$$

the constraint

$$\sum_{\epsilon} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) = 1, \tag{79}$$

and the inequalities

$$p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}) \ge 0, \qquad \epsilon_1, \ldots, \epsilon_{L+1} = 1, 2, \ldots, K. \tag{80}$$

Conversely, from $K^{L+1}$ quantities $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$ satisfying
(78) through (80) we can construct a discrete stationary ($L$th-order
Markov) process having values $x_1, \ldots, x_K$ and truncated covariance
given by (77). To do so, define

$$p_L(\epsilon_1, \ldots, \epsilon_L) = \sum_{\epsilon_{L+1}} p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1}), \qquad \epsilon_1, \epsilon_2, \ldots, \epsilon_L = 1, 2, \ldots, K.$$

Let

$$q_L(\epsilon_{L+1} \mid \epsilon_1, \ldots, \epsilon_L) = \frac{p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})}{p_L(\epsilon_1, \ldots, \epsilon_L)}.$$

It is easy to verify that (12), (13), and (14) are satisfied, so that the
measure described by (15) and (16) defines the desired process.
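The admissibility conditions and the covariance map of this appendix are easy to test on a small table. The sketch below assumes K = 2, L = 1, values x = (−1, +1), and 0-based symbol labels; the two tables and the function names are hypothetical examples, not data from the paper.

```python
import itertools


def is_admissible(p, K, L, tol=1e-12):
    """Check nonnegativity (80), normalization (79), and stationarity (78)."""
    tuples = list(itertools.product(range(K), repeat=L + 1))
    if any(p[e] < -tol for e in tuples):
        return False
    if abs(sum(p[e] for e in tuples) - 1.0) > tol:
        return False
    for mid in itertools.product(range(K), repeat=L):
        summed_first = sum(p[(a,) + mid] for a in range(K))
        summed_last = sum(p[mid + (a,)] for a in range(K))
        if abs(summed_first - summed_last) > tol:
            return False
    return True


def truncated_covariance(p, x, K, L):
    """(77): rho_i = sum over tuples of x[e_1] * x[e_{i+1}] * p(e), for i = 0..L."""
    tuples = list(itertools.product(range(K), repeat=L + 1))
    return [sum(x[e[0]] * x[e[i]] * p[e] for e in tuples) for i in range(L + 1)]


K, L = 2, 1
x = [-1.0, 1.0]
p_good = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
p_bad = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.2}  # marginals disagree

print(is_admissible(p_good, K, L), truncated_covariance(p_good, x, K, L))
print(is_admissible(p_bad, K, L))
```

Because the covariance map is linear in the table entries, sweeping over admissible tables in this way traces out the image region discussed in the text.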

Thus equations (77) through (80) serve to define parametrically the
region $\mathcal{R}$ of admissible truncated covariance sequences. Consider an
$(L + 1)$st-order density $p_{L+1}(\epsilon_1, \ldots, \epsilon_{L+1})$ as a point in a Euclidean
space $\mathcal{E}_{K^{L+1}}$ of dimension $K^{L+1}$. Equations (78), (79), and (80) define
a convex region $\mathcal{V}$ in this space that is bounded by no more than
$K^L + 1 + K^{L+1} \le 2K^{L+1}$ hyperplanes ($K \ge 2$). Equations (77) provide
a linear mapping of $\mathcal{E}_{K^{L+1}}$ into $\mathcal{E}_{L+1}$ and in particular the image of $\mathcal{V}$ is $\mathcal{R}$.
The hyperplane boundaries of $\mathcal{V}$ map into hyperplanes in $\mathcal{E}_{L+1}$ that
include all the hyperplane boundaries of $\mathcal{R}$. Q.E.D.